Subtleties in Tolerating Correlated Failures

Abstract

High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today’s wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using a combination of experimental and mathematical analysis of several real-world failure traces, we debunk four common myths about how to design systems to tolerate such failures. Based on our analysis, we identify a set of design principles that system builders can use to build services that tolerate correlated failures. We show how these lessons can be effectively used by incorporating them into ALI, a distributed read-write storage layer that provides high availability. Our results using ALI on PlanetLab over the past 8 months demonstrate its ability to withstand large correlated failures and meet preconfigured availability targets.


Related resources

Subtleties in Tolerating Correlated Failures in Wide-area Storage Systems

High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today’s wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using several real-world failure traces, we qualitatively answer four important questi...


Triple-Star: a Coding Scheme with Optimal Encoding Complexity for Tolerating Triple Disk Failures in RAID

Low encoding/decoding complexity is essential for practical storage systems. This paper presents a new class of Maximum Distance Separable (MDS) array codes, called Triple-Star, for tolerating triple disk failures in the Redundant Arrays of Inexpensive Disks (RAID) architecture. Triple-Star is an extension of the double-erasure-correcting Rotarycode and a modification of the generalized triple-erasure-corre...


α-Register

It is well known that in an asynchronous message-passing system, one can emulate an atomic register provided that more than half of the processes are non-faulty. By contrast, when a majority of the processes may fail, simulating an atomic register is not possible. This paper investigates weak variants of atomic registers that can be simulated while tolerating failures of a majority of the processes. Specifica...


Security Requirements for Tolerating Security Failures

This paper describes security failure-tolerant requirements, which tolerate the failures of security services that protect applications from security attacks. A security service, such as an authentication, confidentiality, or integrity service, can always be broken as more advanced attack techniques are devised. No security service is secure forever. This paper describes an approa...




Publication date: 2005